Stanfield Systems Incorporated - VIM Toolkit

VAST 2009 Challenge - Challenge 2: Network and Geospatial

Authors and Affiliations:

Tim Jacobs, Stanfield Systems Incorporated, tjacobs@stanfieldsystems.com [PRIMARY Contact]
Delos Ford, Stanfield Systems Incorporated

Tool(s):

The Visual Information Management (VIM) Toolkit was developed in-house at Stanfield Systems to apply a range of visualization methods, in a generic way, to a range of data sources. It performs a variety of common data processing tasks and, by default, can read virtually any common data format; this includes the ability to easily combine data from disparate sources. The processed data can then be visualized in several ways, and multiple visualizations can be shown on one screen. All visualizations displayed by the toolkit are interactive and use elements of focus-plus-context design. More information about the tool and Stanfield Systems is available at this link.

ANSWERS:


MC2.1: Which of the two social structures, A or B, most closely matches the scenario you have identified in the data?

The most likely structure resembles a combination of the two structures and is not clearly more like one than the other.

 


MC2.2: Provide the social network structure you have identified as a tab-delimited file. It should contain the employee, one or more handlers, any middle folks, and the localized leader with their international contacts. What are the Flitter names of the persons involved? Please identify only key connections (not all single links, for example) as well as any other nodes related to the scenario (if any) you may have discovered that were not described in the two scenarios A and B above. Please name the file Flitter.txt and place it in the same directory as your index.htm file. Please see the format required in the Task Descriptions.

Flitter.txt

 


MC2.3:  Characterize the difference between your social network and the closest social structure you selected (A or B). If you include extra nodes please explain how they fit in to your scenario or analysis. 

This task required complex data preprocessing as well as complex visualization and analysis. Our first attempt was to visualize all nodes and links on a single graph and try to identify the hierarchies described in the question. Unfortunately, because of the size of the dataset and the tight interconnection of links, a graph of all Flitter users was not helpful for analysis.

We therefore had to preprocess the data to narrow it down. Scenario A and Scenario B have similar criteria for the players involved, so it was reasonably easy to create one visualization to look for instances of both. Specifically, nodes whose number of links ruled them out as potential players were removed from the graph. However, the number of links each player has is important to the scenario, and many of the removed nodes were the other end of links belonging to players that did matter, so that information had to be conveyed differently in the graph. The preprocessing therefore also counted the links of each potentially important individual and stored the count as a field for use in the visualization.
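The preprocessing described above can be sketched roughly as follows. This is a minimal illustration in Python, not the VIM Toolkit's own implementation; the function names and the single `min_links` threshold are hypothetical, and the actual filter used link ranges taken from the scenario descriptions.

```python
from collections import Counter

def link_counts(links):
    """Count links per user from a list of (user_a, user_b) pairs."""
    counts = Counter()
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return counts

def filter_candidates(links, min_links):
    """Drop users with fewer than min_links links (hypothetical threshold).

    Counts are taken on the FULL graph before filtering, so a kept user
    still carries its true link count even after its low-degree
    neighbours have been removed from the picture.
    """
    counts = link_counts(links)
    keep = {user for user, c in counts.items() if c >= min_links}
    kept_links = [(a, b) for a, b in links if a in keep and b in keep]
    return keep, kept_links, counts

# Toy data for illustration only; not taken from the challenge dataset.
links = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
keep, kept_links, counts = filter_candidates(links, min_links=2)
```

Storing the count as a separate field is what lets the filtered graph stay small while still showing how connected each remaining user really is.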

The link count field then allowed us to colorize the nodes based on which person in the hierarchy they could potentially be. Using the narrowed-down, colorized graph interactively, it was very easy to identify potential candidates for the defecting employee and his/her handlers. All an analyst has to do at that point is go through each potential employee (highlighted red) and check whether it has at least 3 links to either a red node (potentially an employee OR a handler, since the link ranges overlap) or a magenta node.
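The coloring and the employee check can be expressed as a short sketch. The range values below are placeholders chosen for the example, not the thresholds used in the analysis, and the function names are hypothetical.

```python
def color_for(count, ranges):
    """Return the first color whose [low, high] range contains count."""
    for color, low, high in ranges:
        if low <= count <= high:
            return color
    return "white"  # uncolored: fits no role range

def employee_candidates(adjacency, colors, needed=3):
    """Red nodes with at least `needed` links to red or magenta nodes."""
    result = []
    for node, color in colors.items():
        if color != "red":
            continue
        hits = sum(
            1 for nb in adjacency.get(node, ())
            if colors.get(nb) in ("red", "magenta")
        )
        if hits >= needed:
            result.append(node)
    return sorted(result)

# Placeholder ranges and toy graph, for illustration only.
ranges = [("red", 5, 10), ("magenta", 3, 4)]
adjacency = {
    "emp": {"h1", "h2", "h3"},
    "h1": {"emp"}, "h2": {"emp"}, "h3": {"emp"},
    "lone": {"h1"},
}
colors = {"emp": "red", "h1": "red", "h2": "magenta",
          "h3": "magenta", "lone": "red"}
candidates = employee_candidates(adjacency, colors)
```

In the toolkit this check was done by eye on the interactive graph; the loop above only makes the rule explicit.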

Unfortunately, a high concentration of users falls into the range of possibility for a middleman in Scenario A, so Scenario A could not be fully identified or ruled out in a single graph. Scenario B was still completely traceable by checking each middleman for a link to a white (uncolored) node. This can be expedited somewhat by increasing the depth to which connected nodes are highlighted (an option available in the toolkit), but because of the tight interconnection of the graph this can only rule out possibilities, not confirm them.
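Depth-limited highlighting of connected nodes amounts to a bounded breadth-first search. The sketch below is an assumed equivalent written for illustration; `neighborhood` is a hypothetical name, not a toolkit function.

```python
from collections import deque

def neighborhood(adjacency, start, depth):
    """Map each node within `depth` links of `start` to its distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the requested depth
        for nb in adjacency.get(node, ()):
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return seen

# Toy chain a-b-c-d, for illustration only.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
within_two = neighborhood(adjacency, "a", 2)
```

A node missing from the result cannot be reached within the chosen depth, which is why the highlight is useful for ruling candidates out. A node that is present may be reached through many unrelated paths in a densely connected graph, so presence alone confirms nothing.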

After identifying all potential candidates for Scenario A and Scenario B, each was cross-checked against the raw data to confirm it or rule it out with as much certainty as could be attained. This resulted in 0 possible candidates. Generally there were possible handlers, but no possible middlemen.

To confirm that nothing was slightly off or missed, we re-checked the graph and also loosened the node filters to see whether anything new showed up. Still, no structure matched either of the given hierarchies. One nearly matched Scenario B, but no Fearless Leader could be identified.

Because the scenarios were purposefully worded vaguely, it seemed evident at that point that the lack of anything matching them was intentional. We then began to search for similar structures. Many possibilities were identified for an employee->two handlers situation with either one or two middlemen, but they were too numerous, with no criteria to rank the matches by likelihood. The structure finally identified as most closely resembling the presented scenarios is one employee->4 middlemen->2 handlers->Fearless Leader. Given the extreme improbability of such a structure occurring by chance among ~6000 nodes and ~30000 links, as well as the complete lack of structures fitting Scenarios A and B, we concluded that this structure was the most likely form of the hierarchy for the situation described.
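For a rough sense of scale, the approximate node and link counts quoted above give the following mean link count and link density. This is plain arithmetic on the rounded figures, not a result computed from the dataset, and it says nothing rigorous about how likely any particular structure is.

```python
def mean_degree(nodes, links):
    """Average number of links per node in an undirected graph."""
    return 2 * links / nodes

def density(nodes, links):
    """Fraction of all possible node pairs that are actually linked."""
    possible_pairs = nodes * (nodes - 1) // 2
    return links / possible_pairs

# Rounded figures from the text (~6000 nodes, ~30000 links).
avg = mean_degree(6000, 30000)
dens = density(6000, 30000)
```

With these figures the average user has about 10 links, and fewer than 0.2% of all possible pairs are linked, which gives some intuition for why a specific multi-level arrangement of users with very particular link counts stands out.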

The structure identified also fits the geospatial criteria given, in that the Fearless Leader has international contacts from all surrounding countries. Beyond that, the geospatial data was not given much weight as evidence for or against anything, because geographical location beyond each person's country of residence is essentially meaningless in a situation like this. There is no logical reason the Fearless Leader would be more likely to be in a small city than a larger one (or vice versa), and the same holds true for all players involved. Additionally, there is no guarantee that information gleaned from a social networking site (such as location) has not been purposefully altered or obfuscated, since providing it is completely voluntary. That the social networking links were identified as an important aspect of the scenario does not imply importance or reliability for any other information gleaned from the site.

 


MC2.4:  How is your hypothesis about the social structure in Part 1 supported by the city locations of Flovania? What part(s), if any, did the role of geographical information play in the social network of part one? 

The geospatial information played the limited role of identifying the foreign contacts of the Fearless Leader.

 


MC2.5:  In general, how are the Flitter users dispersed throughout the cities of this challenge? Which of the surrounding countries may have ties to this criminal operation?  Why might some be of more significant concern than others?

There seemed to be no particularly significant pattern to the distribution of Flitter users, aside from obvious correlations with population size.